MLCBART: Multilabel Classification with Bayesian Additive Regression Trees
Tian, Jiahao, Chipman, Hugh, Loughin, Thomas
Multilabel Classification (MLC) deals with the simultaneous classification of multiple binary labels. The task is challenging because not only may there be arbitrarily different and complex relationships between predictor variables and each label, but associations among labels may persist even after accounting for the effects of predictor variables. In this paper, we present a Bayesian additive regression tree (BART) framework to model the problem. BART is a nonparametric and flexible model structure capable of uncovering complex relationships within the data. Our adaptation, MLCBART, assumes that labels arise from thresholding an underlying numeric scale, where a multivariate normal model allows explicit estimation of the correlation structure among labels. This enables the discovery of complicated relationships in various forms and improves MLC predictive performance. Our Bayesian framework not only enables uncertainty quantification for each predicted label; its MCMC draws also produce an estimated conditional probability distribution of label combinations for any predictor values. Simulation experiments demonstrate the effectiveness of the proposed model by comparing its performance with a set of competing models, including the oracle model with the correct functional form. Results show that our model predicts vectors of labels more accurately than the other contenders, with performance close to that of the oracle model. An example highlights how the method's ability to produce measures of uncertainty on predictions provides a nuanced understanding of classification results.
- North America > Canada > British Columbia > Metro Vancouver Regional District > Burnaby (0.04)
- Asia > China (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.86)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.71)
- North America > United States > Michigan (0.04)
- Europe > France (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Modeling & Simulation (0.93)
- Information Technology > Data Science > Data Mining (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)
- North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
- North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
- North America > United States > New Jersey > Hudson County > Secaucus (0.04)
- (2 more...)
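The latent-threshold idea in the abstract above can be sketched in a few lines: label vectors are obtained by thresholding a correlated multivariate normal at zero, so positive residual correlation makes matching label combinations more likely. A minimal Python illustration with synthetic inputs (in MLCBART the latent mean would come from a BART sum-of-trees fit; here it is supplied directly, and all names are illustrative):

```python
import numpy as np

def sample_labels(mean, corr, n_draws=1000, rng=None):
    """Draw 0/1 label vectors by thresholding a latent multivariate normal.

    mean : (L,) latent mean for each of L labels (stands in for a BART fit).
    corr : (L, L) residual correlation matrix among labels.
    Returns an (n_draws, L) array of binary label vectors.
    """
    rng = np.random.default_rng(rng)
    z = rng.multivariate_normal(mean, corr, size=n_draws)
    return (z > 0).astype(int)

# Two positively correlated labels: the combinations (0,0) and (1,1)
# should occur more often than under independence (0.25 each).
draws = sample_labels(mean=[0.0, 0.0],
                      corr=[[1.0, 0.8], [0.8, 1.0]],
                      n_draws=5000, rng=42)
combos, counts = np.unique(draws, axis=0, return_counts=True)
probs = dict(zip(map(tuple, combos), counts / len(draws)))
```

With residual correlation 0.8, the combinations (0,0) and (1,1) together account for well over half of the draws, which is exactly the kind of label association a separate per-label classifier cannot represent.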
85fc37b18c57097425b52fc7afbb6969-AuthorFeedback.pdf
We thank each reviewer for their insightful and constructive feedback. It will surely improve the manuscript. In Figure 1, we show the outcome of one such experiment. The results for CART trees carry over to random forests, an extension that goes beyond the ensemble principle. Thus, one can use the node diameters as a proxy for the approximation error. Reviewer #2 mentioned the reference [Nobel, 2002].
A Machine Learning-Based Framework to Shorten the Questionnaire for Assessing Autism Intervention
Dong, Audrey, Xu, Claire, Guo, Samuel R., Yang, Kevin, Kong, Xue-Jun
Caregivers of individuals with autism spectrum disorder (ASD) often find the 77-item Autism Treatment Evaluation Checklist (ATEC) burdensome, limiting its use for routine monitoring. This study introduces a generalizable machine learning framework that seeks to shorten assessments while maintaining evaluative accuracy. Using longitudinal ATEC data from 60 autistic children receiving therapy, we applied feature selection and cross-validation techniques to identify the most predictive items across two assessment goals: longitudinal therapy tracking and point-in-time severity estimation. For progress monitoring, the framework identified 16 items (21% of the original questionnaire) that retained strong correlation with total score change and full subdomain coverage. We also generated smaller subsets (1-7 items) for efficient approximations. For point-in-time severity assessment, our model achieved over 80% classification accuracy using just 13 items (17% of the original set). While demonstrated on ATEC, the methodology, based on subset optimization, model interpretability, and statistical rigor, is broadly applicable to other high-dimensional psychometric tools. The resulting framework could potentially enable more accessible, frequent, and scalable assessments and offer a data-driven approach for AI-supported interventions across neurodevelopmental and psychiatric contexts.
- North America > United States > Texas > Travis County > Austin (0.04)
- Africa > Sudan (0.04)
- North America > United States > New York > Rensselaer County > Troy (0.04)
- (5 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
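The item-subset idea above can be sketched with standard tools: score every item for its association with the total score, keep the top k, and cross-validate how well the short form tracks the full score. A hedged Python sketch on synthetic data (the study's actual selection and validation procedure is not reproduced here; dimensions mirror the 77-item questionnaire and 13-item short form, but the data and signal structure are invented for illustration):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for questionnaire data: 77 items scored 0-3, where
# (by construction) only the first 13 items drive the total severity score.
rng = np.random.default_rng(0)
n_respondents, n_items, k = 200, 77, 13
items = rng.integers(0, 4, size=(n_respondents, n_items)).astype(float)
total = 2.0 * items[:, :k].sum(axis=1) + rng.normal(0, 3, n_respondents)

# Keep the k items most associated with the total score...
selector = SelectKBest(f_regression, k=k).fit(items, total)
subset = selector.transform(items)

# ...and check by cross-validation how well the short form recovers it.
r2 = cross_val_score(LinearRegression(), subset, total,
                     cv=5, scoring="r2").mean()
```

Because the synthetic total is driven by a small item subset, the univariate screen recovers most of the signal and the 13-item short form tracks the full score closely under cross-validation.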
Enhancing Bankruptcy Prediction of Banks through Advanced Machine Learning Techniques: An Innovative Approach and Analysis
Rustam, Zuherman, Hartini, Sri, Islam, Sardar M. N., Novkaniza, Fevi, Aszhari, Fiftitah R., Rifqi, Muhammad
Context: Financial system stability is determined by the condition of the banking system. A bank failure can destroy the stability of the financial system, as banks are subject to systemic risk, affecting not only individual banks but also segments or the entire financial system. Calculating the probability of a bank going bankrupt is one way to ensure the banking system is safe and sound. Existing literature and limitations: Statistical models, such as Altman's Z-Score, are one of the common techniques for developing a bankruptcy prediction model. However, statistical methods rely on rigid and sometimes irrelevant assumptions, which can result in low forecast accuracy. New approaches are necessary. Objective of the research: Bankruptcy models are developed using machine learning techniques, such as logistic regression (LR), random forest (RF), and support vector machines (SVM). According to several studies, machine learning is also more accurate and effective than statistical methods for categorising and forecasting banking risk management. Present research: The commercial bank data are derived from the annual financial statements of 44 active banks and 21 bankrupt banks in Turkey from 1994 to 2004, and the rural bank data are derived from the quarterly financial reports of 43 active and 43 bankrupt rural banks in Indonesia between 2013 and 2019. Five rural banks in Indonesia have also been selected to demonstrate the feasibility of analysing bank bankruptcy trends. Findings and implications: The results of the research experiments show that RF can forecast data from commercial banks with a 90% accuracy rate. Furthermore, the three machine learning methods proposed accurately predict the likelihood of rural bank bankruptcy. Contribution and conclusion: The proposed machine learning approach helps to implement policies that reduce the costs of bankruptcy.
- Asia > Middle East > Republic of Türkiye (0.26)
- Europe > United Kingdom (0.04)
- North America > United States > New York (0.04)
- (9 more...)
- Banking & Finance > Financial Services (0.49)
- Information Technology > Security & Privacy (0.48)
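As an illustration of the modeling setup described above, the sketch below trains a random forest on synthetic financial ratios in which failed banks are shifted toward worse values. It is not the paper's data, features, or tuning, only a minimal instance of the technique:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic financial ratios (e.g. capital adequacy, profitability):
# bankrupt banks are centered at worse values than healthy ones.
rng = np.random.default_rng(1)
n_per_class, n_ratios = 200, 6
healthy = rng.normal(loc=1.0, scale=0.5, size=(n_per_class, n_ratios))
failed = rng.normal(loc=0.0, scale=0.5, size=(n_per_class, n_ratios))
X = np.vstack([healthy, failed])
y = np.array([0] * n_per_class + [1] * n_per_class)  # 1 = bankrupt

# Hold out a test set, fit the forest, and score held-out accuracy.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=1)
rf = RandomForestClassifier(n_estimators=200, random_state=1).fit(X_tr, y_tr)
acc = accuracy_score(y_te, rf.predict(X_te))
```

On well-separated synthetic classes like these the forest's held-out accuracy is high; on real financial statements, class overlap and imbalance make careful validation (as in the study) essential.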
Artificial Intelligence for Pediatric Height Prediction Using Large-Scale Longitudinal Body Composition Data
Chun, Dohyun, Jung, Hae Woon, Kang, Jongho, Jang, Woo Young, Kim, Jihun
Height growth serves as a key health indicator, reflecting the interplay of genetic, environmental, and socioeconomic factors (Norris et al., 2022; Baxter-Jones et al., 2011; Hargreaves et al., 2022). Monitoring height growth enables early detection of disorders, facilitating timely interventions (Saari et al., 2015; Craig et al., 2011; Grote et al., 2008; Zhang et al., 2016). Accurate future height prediction is essential for diagnosing growth disorders, initiating hormone therapy, and evaluating treatment efficacy (Collett-Solberg et al., 2019; Ostojic, 2013; Cuttler & Silvers, 2004). Traditional height prediction methods rely on skeletal maturity assessment using hand-wrist radiographs. These include the Bayley-Pinneau (Bayley and Pinneau, 1952), Tanner-Whitehouse (Tanner et al., 1975), and Roche-Wainer-Thissen (Roche et al., 1975) methods. However, these approaches have limitations including radiation exposure, the need for specialized expertise, and high interobserver variability (Bull et al., 1999; Chávez-Vázquez et al., 2024; Prokop-Piotrkowska et al., 2021).
- Asia > South Korea > Seoul > Seoul (0.04)
- North America > United States > New York (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- (3 more...)
- Research Report > Strength High (1.00)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Pediatrics/Neonatology (0.66)
- Health & Medicine > Diagnostic Medicine > Imaging (0.48)
Using Artificial Intelligence to Improve Classroom Learning Experience
Shadeeb Hossain, Engineering Technology and Information Sciences, DeVry University, New York, USA [ORCID ID: 0000-0002-5224-7684]. Abstract: This paper explores advancements in Artificial Intelligence (AI) technologies to enhance classroom learning, highlighting contributions from companies like IBM, Microsoft, and Google, tools like ChatGPT, as well as the potential of brain signal analysis. The focus is on improving students' learning experiences by using Machine Learning (ML) algorithms to (i) identify a student's preferred learning style (visual or auditory) and (ii) predict academic dropout risk. A Logistic Regression algorithm is applied for binary classification using six predictor variables, such as assessment scores, lesson duration, and preferred learning style, to accurately identify learning preferences. In comparison, the Stochastic Gradient Descent (SGD) classifier achieved an accuracy of 83.1% on the same dataset. Individual feedback to students and customized learning materials have a significant impact on their learning ability and have been areas of active research focus [1]. However, in the United States, due to the vast diversity in classroom populations, it becomes inherently difficult for educators to customize lessons and address individual students' problems [2]. Various factors contribute to the effectiveness of individual learning processes [3,4]. Questionnaires have often been used as a tool to predict an individual's learning style [5-8]. Learning analytics, which involves the collection, analysis, and use of data, has been suggested to improve students' learning experiences [9]. In most cases, these assessments have been used to generalize the overall learning patterns of a classroom rather than addressing the needs of individual students. The concept of a SMART classroom incorporates both hardware and software components to adapt to dynamic learning patterns in a classroom, and it has been an area of ongoing research [10,11].
- Instructional Material (1.00)
- Research Report > New Finding (0.88)
- Education > Educational Setting (1.00)
- Education > Educational Technology > Educational Software > Computer Based Training (0.47)
- Health & Medicine > Therapeutic Area > Neurology (0.34)
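The binary-classification setup described in the abstract can be sketched as follows, with six synthetic predictors standing in for assessment scores, lesson duration, and the like, and both a logistic regression and an SGD-trained linear classifier fit for comparison (all data, coefficients, and names here are illustrative, not the paper's):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Six synthetic predictors; the binary target (0 = visual, 1 = auditory)
# is tied to an invented linear signal plus noise for illustration.
rng = np.random.default_rng(7)
X = rng.normal(size=(300, 6))
coefs = np.array([1.5, -1.0, 0.8, 0.0, 0.5, -0.3])
y = (X @ coefs + rng.normal(0, 0.5, 300) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=7)

# Logistic regression and an SGD-trained linear classifier, each with
# feature standardization in a pipeline.
logreg = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_tr, y_tr)
sgd = make_pipeline(StandardScaler(),
                    SGDClassifier(random_state=7)).fit(X_tr, y_tr)
acc_lr = logreg.score(X_te, y_te)
acc_sgd = sgd.score(X_te, y_te)
```

Standardizing inside a pipeline matters especially for the SGD classifier, whose convergence is sensitive to feature scale; the paper's 83.1% figure is for its own dataset and is not reproduced by this synthetic sketch.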
An Explainable Pipeline for Machine Learning with Functional Data
Goode, Katherine, Tucker, J. Derek, Ries, Daniel, Hofmann, Heike
Machine learning (ML) models have shown success in applications with an objective of prediction, but the algorithmic complexity of some models makes them difficult to interpret. Methods have been proposed to provide insight into these "black-box" models, but there is little research that focuses on supervised ML when the model inputs are functional data. In this work, we consider two applications from high-consequence spaces with objectives of making predictions using functional data inputs. One application aims to classify material types to identify explosive materials given hyperspectral computed tomography scans of the materials. The other application considers the forensic science task of connecting an inkjet printed document to the source printer using color signatures extracted by Raman spectroscopy. An intuitive route for analyzing these data is a data-driven ML model for classification, but due to the high-consequence nature of the applications, we argue it is important to account appropriately for the nature of the data in the analysis so as not to obscure or misrepresent patterns. As such, we propose the Variable importance Explainable Elastic Shape Analysis (VEESA) pipeline for training ML models with functional data that (1) accounts for the vertical and horizontal variability in the functional data and (2) provides an explanation in the original data space of how the model uses variability in the functional data for prediction. The pipeline makes use of elastic functional principal components analysis (efPCA) to generate uncorrelated model inputs and permutation feature importance (PFI) to identify the principal components important for prediction. The variability captured by the important principal components is visualized in the original data space. We ultimately discuss ideas for natural extensions of the VEESA pipeline and challenges for future research.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Nebraska > Lancaster County > Lincoln (0.04)
- Research Report (1.00)
- Overview (0.67)
- Energy (0.93)
- Government > Regional Government > North America Government > United States Government (0.93)
- Health & Medicine > Therapeutic Area (0.92)
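The two pipeline steps above can be sketched with ordinary (non-elastic) PCA standing in for efPCA: compute uncorrelated principal-component scores from the curves, fit a classifier on the scores, and rank components by permutation feature importance. A minimal Python sketch on synthetic curves (assumptions: plain PCA instead of elastic fPCA, a random forest as the classifier, and an invented class-specific bump as the discriminating feature):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic functional observations on a common grid: class 1 curves
# carry an extra Gaussian bump on top of a shared sinusoid.
rng = np.random.default_rng(3)
t = np.linspace(0, 1, 100)
n = 200
y = rng.integers(0, 2, n)
bump = np.exp(-((t - 0.5) ** 2) / 0.01)
curves = (np.sin(2 * np.pi * t) + y[:, None] * bump
          + rng.normal(0, 0.2, (n, 100)))

# Step 1: uncorrelated model inputs via principal-component scores.
pca = PCA(n_components=5)
scores = pca.fit_transform(curves)

# Step 2: fit a classifier on the scores and rank components by
# permutation feature importance on held-out data.
X_tr, X_te, y_tr, y_te = train_test_split(scores, y, random_state=3)
clf = RandomForestClassifier(n_estimators=100, random_state=3).fit(X_tr, y_tr)
pfi = permutation_importance(clf, X_te, y_te, n_repeats=20, random_state=3)
top_pc = int(np.argmax(pfi.importances_mean))
```

Mapping the top-ranked component's loadings back onto the grid would localize the discriminating variability in the original data space, which is the explanation step VEESA formalizes (there with elastic alignment handling horizontal variability that plain PCA ignores).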